Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction

Authors

  • Vinodh Gopal
  • Erdinc Ozturk
  • Jim Guilford
  • Gil Wolrich
  • Wajdi Feghali
  • Martin Dixon
Abstract

This paper presents a fast and efficient method of computing CRC on IA processors with generic polynomials using the carry-less multiplication instruction, PCLMULQDQ. Instead of reducing the entire message with traditional reduction algorithms, we use a faster folding approach to reduce an arbitrary-length buffer to a small fixed size, which is then reduced further by traditional methods such as Barrett reduction. A parallelized folding approach is used to maximize the throughput of PCLMULQDQ instructions. We show how to do this efficiently for data buffers of arbitrary length. The final reduction step differs only slightly between polynomial sizes (e.g., a 32-bit CRC and a 16-bit CRC). With our novel folding methods, CRC computation using PCLMULQDQ is faster than the best software routines that do not use the instruction, on a range of IA processor cores. This paper will enable customers to code and optimize any CRC application for maximum performance on Westmere. We use real examples in the paper to illustrate the methods.
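
As a rough illustration of the folding idea summarized above, here is a minimal Python sketch. It is not the paper's code. It models PCLMULQDQ with a software carry-less multiply, ignores bit reflection, initial values and final XOR, and uses plain long division where the paper uses Barrett reduction. The names clmul, polymod, fold_to_128, K_HI and K_LO are hypothetical, chosen for this example only.

```python
# Illustrative sketch only: a simplified, non-reflected model of folding.

def clmul(a: int, b: int) -> int:
    """Carry-less product in GF(2)[x]; stands in for PCLMULQDQ."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def polymod(a: int, p: int) -> int:
    """Remainder of a divided by p in GF(2)[x], by bitwise long division."""
    deg_p = p.bit_length() - 1
    while a.bit_length() - 1 >= deg_p:
        a ^= p << (a.bit_length() - 1 - deg_p)
    return a


P = 0x104C11DB7            # CRC-32 generator polynomial (degree 32)
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1

# Folding constants (assumed convention for this sketch):
# moving a 64-bit half forward by 128 bits is multiplication by x^128,
# and the high half carries an extra factor of x^64.
K_HI = polymod(1 << 192, P)
K_LO = polymod(1 << 128, P)


def fold_to_128(msg: int, nbits: int) -> int:
    """Reduce an nbits-long message to 128 bits, congruent to msg mod P."""
    if nbits < 128 or nbits % 128:
        raise ValueError("nbits must be a positive multiple of 128")
    acc = (msg >> (nbits - 128)) & MASK128
    for shift in range(nbits - 256, -1, -128):
        block = (msg >> shift) & MASK128
        # Two carry-less multiplies fold the accumulator onto the next block.
        acc = clmul(acc >> 64, K_HI) ^ clmul(acc & MASK64, K_LO) ^ block
    return acc


msg = int.from_bytes(bytes(range(64)), "big")   # 512-bit sample buffer
folded = fold_to_128(msg, 512)
# The folded 128-bit value leaves the same remainder as the full message.
print(polymod(folded, P) == polymod(msg, P))
```

Each loop iteration costs two carry-less multiplies and two XORs regardless of message length, which is why folding scales better than reducing the whole buffer bit by bit. Only the last step, the reduction of the 128-bit value, depends on the degree of the polynomial.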

Similar resources

Choosing a CRC polynomial and associated method for Fast CRC Computation on Intel® Processors white paper

Cyclic Redundancy Check (CRC) codes are widely used for integrity checking of data in fields such as storage and networking. Fast and efficient methods of computing CRC on Intel® processors have been proposed for the fixed (degree-32) iSCSI polynomial, using the CRC32 instruction introduced in the Intel® Core™ i7 Processors. In addition, the PCLMULQDQ instruction can be used for fast CRC comput...

Impact of Intel's New Instruction Sets on Software Implementation of GF(2)[x] Multiplication

PCLMULQDQ, a new instruction that supports GF(2)[x] multiplication, was introduced by Intel in 2010. This instruction brings dramatic change to software implementation of multiplication in GF(2^m) fields. In this paper, we present improved Karatsuba formulae for multiplying two small binary polynomials, compare different strategies for PCLMULQDQ-based multiplication in the five GF(2^m) fields reco...

Fast parallel CRC algorithm and implementation on a configurable processor

In this paper we present a fast cyclic redundancy check (CRC) algorithm that performs CRC computation for any length of message in parallel. For a given message with any length, we first chunk the message into blocks, each of which has a fixed size equal to the degree of the generator polynomial. Then we perform CRC computation among the chunked blocks in parallel using Galois Field multiplica...

A Symbolic-Numeric Software Package for the Computation of the GCD of Several Polynomials

This survey is intended to present a package of algorithms for the computation of exact or approximate GCDs of sets of several polynomials and the evaluation of the quality of the produced solutions. These algorithms are designed to operate in symbolic-numeric computational environments. The key of their effectiveness is the appropriate selection of the right type of operations (symbolic or num...

Faster Binary-Field Multiplication and Faster Binary-Field MACs

This paper shows how to securely authenticate messages using just 29 bit operations per authenticated bit, plus a constant overhead per message. The authenticator is a standard type of “universal” hash function providing information-theoretic security; what is new is computing this type of hash function at very high speed. At a lower level, this paper shows how to multiply two elements of a fie...

Publication date: 2010